Abstract: Lovasz's bound to the capacity of a graph and the
sphere-packing bound to the probability of error in channel
coding are given a unified presentation as information radii of the
Csiszar type using the Renyi divergence in the classical-quantum
setting. This brings together two results in coding theory that are 
usually considered as being of a very different nature, one being 
a "combinatorial" result and the other being "probabilistic". In 
the context of quantum probability, this difference disappears. 

I. Introduction 

One of the central topics in coding theory is the problem 
of bounding the probability of error of optimal codes for 
communication over a given channel. Shannon [1] introduced
the notion of channel capacity $C$, which represents the largest
rate at which information can be sent through the channel with
probability of error that vanishes with increasing block-length.
He then also introduced [2] the notion of zero-error capacity
$C_0$ as the largest rate at which information can be sent with
probability of error precisely equal to zero. For rates in the
range $C_0 < R < C$, the probability of error is known to
decrease exponentially in the block-length $n$ as



$$P_e \approx e^{-nE(R)}, \tag{1}$$



where $E(R)$ is the so-called reliability function of the channel.
While in the region of high rates the function $E(R)$ is
known exactly, in the low rate region little is known about $P_e$;
determining both $E(R)$ and $C_0$ is an unsolved problem and
only upper and lower bounds for these quantities are known.

Two of the most important contributions to the study of
$E(R)$ and of $C_0$, which came respectively in the '60s and
in the '70s, are the sphere-packing bound $E(R) \le E_{sp}(R)$
and Lovasz's bound $C_0 \le \vartheta$ [4]. These two bounds are
usually considered as being the result of totally unrelated
methods. In this paper, we show that this is not the case, and
that Lovasz's result comes as a special case of the sphere-packing
bound once we move to the more general context
of classical-quantum channels. In order to do that, we extend
to the classical-quantum case a result of Csiszar that allows
us to express the sphere-packing exponent [5] in terms of an
information radius using the Renyi divergence. Lovasz's result
then emerges naturally as a special case. This leads to a unified
view of two of the most important bounds to $E(R)$ and to
$C_0$, showing that quantum probability could be the right tool
to work at the intersection of probability and combinatorics.



II. Classical Channels 

A. Basic notations and definitions 

Let $W(y|x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, be the transition probabilities
of a discrete memoryless channel $W : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X}$ and
$\mathcal{Y}$ are finite sets. For a sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$
and a sequence $\mathbf{y} = (y_1, y_2, \ldots, y_n) \in \mathcal{Y}^n$, the probability of
observing $\mathbf{y}$ at the output of the channel given $\mathbf{x}$ at the input
is



$$W^{(n)}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n} W(y_i|x_i). \tag{2}$$



A block code with $M$ messages and block-length $n$ is a
mapping from a set $\{1, 2, \ldots, M\}$ of $M$ messages onto a set
$\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M\}$ of $M$ sequences in $\mathcal{X}^n$. The rate $R$ of the
code is defined as $R = (\log M)/n$. A decoder is a mapping
from $\mathcal{Y}^n$ into the set of possible messages $\{1, 2, \ldots, M\}$. If
message $m$ is to be sent, the encoder transmits the codeword
$\mathbf{x}_m$ through the channel. An output sequence $\mathbf{y}$ is received by
the decoder, which maps it to a message $\hat{m}$. An error occurs
if $\hat{m} \ne m$.

Let $Y_m \subseteq \mathcal{Y}^n$ be the set of output sequences that are
mapped into message $m$. When message $m$ is sent, the
probability of error is

$$P_{e|m} = \sum_{\mathbf{y} \notin Y_m} W^{(n)}(\mathbf{y}|\mathbf{x}_m). \tag{3}$$

The maximum error probability of the code is defined as the
largest $P_{e|m}$, that is,

$$P_{e,\max} = \max_m P_{e|m}. \tag{4}$$
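As an illustration of definitions (2)-(4), the following minimal sketch evaluates the maximum error probability of a small code by brute-force enumeration of the output sequences. The channel (a binary symmetric channel with crossover probability 0.1), the length-3 repetition code, the maximum-likelihood decoder and all function names are assumptions chosen for the example; they do not appear in the text above.

```python
from itertools import product

def seq_prob(W, y, x):
    """W^(n)(y|x) for a memoryless channel: product of per-symbol terms, as in (2)."""
    prob = 1.0
    for yi, xi in zip(y, x):
        prob *= W[xi][yi]
    return prob

def max_error_probability(W, codewords, outputs):
    """P_e,max of (4) under maximum-likelihood decoding (ties go to the lowest index)."""
    n = len(codewords[0])
    errors = [0.0] * len(codewords)
    for y in product(outputs, repeat=n):
        likelihoods = [seq_prob(W, y, x) for x in codewords]
        decoded = likelihoods.index(max(likelihoods))
        for m, likelihood in enumerate(likelihoods):
            if m != decoded:
                errors[m] += likelihood  # y lies outside the decision region Y_m, as in (3)
    return max(errors)

# Assumed example: binary symmetric channel, W[x][y], with a length-3 repetition code.
p = 0.1
bsc = {0: {0: 1 - p, 1: p}, 1: {0: p, 1: 1 - p}}
pe_max = max_error_probability(bsc, [(0, 0, 0), (1, 1, 1)], outputs=(0, 1))
```

For this code an error occurs exactly when two or more symbols are flipped, so `pe_max` should agree with the closed form 3p^2(1-p) + p^3. The enumeration is exponential in n and is only meant for very short codes.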

Let $P_{e,\max}^{(n)}(R)$ be the minimum maximum error probability
among all codes of length $n$ and rate at least $R$. Shannon's
theorem [1] states that sequences of codes exist such that
$P_{e,\max}^{(n)}(R) \to 0$ as $n \to \infty$ for all rates smaller than a constant
$C$, called channel capacity, which is given by the expression

$$C = \max_P \sum_{x,y} P(x) W(y|x) \log \frac{W(y|x)}{\sum_{x'} P(x') W(y|x')}, \tag{5}$$

where the maximum is over all probability distributions on the
input alphabet.
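The objective maximized in (5) is straightforward to evaluate for a given input distribution. The sketch below does so in nats for a binary symmetric channel, where the uniform input is known to be the maximizer; the channel, the crossover probability and the function name are assumptions made for illustration only.

```python
from math import log

def mutual_information(P, W):
    """Objective of (5) in nats for input distribution P and channel matrix W[x][y]."""
    n_out = len(W[0])
    # Output distribution induced by P: sum over x' of P(x') W(y|x').
    out = [sum(P[x] * W[x][y] for x in range(len(P))) for y in range(n_out)]
    total = 0.0
    for x, px in enumerate(P):
        for y in range(n_out):
            if px > 0 and W[x][y] > 0:  # terms with zero probability contribute nothing
                total += px * W[x][y] * log(W[x][y] / out[y])
    return total

# Assumed example: binary symmetric channel with crossover probability p.
p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]
capacity = mutual_information([0.5, 0.5], bsc)
binary_entropy = -p * log(p) - (1 - p) * log(1 - p)
```

With the uniform input the result should match the familiar expression log 2 minus the binary entropy of p, and any other input distribution should give a smaller value.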

For $R < C$, Shannon's theorem only asserts that
$P_{e,\max}^{(n)}(R) \to 0$ as $n \to \infty$. For a range of rates $C_0 < R < C$,
the optimal probability of error $P_{e,\max}^{(n)}(R)$ is known to have
an exponential decrease in $n$, and it is thus useful to define
the reliability function of the channel as

$$E(R) = \limsup_{n \to \infty} -\frac{1}{n} \log P_{e,\max}^{(n)}(R). \tag{6}$$

The value $C_0$ is the so-called zero-error capacity, also introduced
by Shannon [2], which is defined as the highest rate
at which communication is possible with probability of error
precisely equal to zero. More formally,

$$C_0 = \sup \{R : P_{e,\max}^{(n)}(R) = 0 \text{ for some } n\}. \tag{7}$$

For $R < C_0$, we may define the reliability function $E(R)$ as
being infinite. Determining the reliability function $E(R)$ (at
low positive rates) and the zero-error capacity $C_0$ of a general
channel is still an unsolved problem.

B. Reliability and zero-error capacity 

In order to study the zero-error capacity of a channel, it
is important to consider when two input symbols or two
input sequences are confusable and when they are not. Note
that two input symbols $x$ and $x'$ cannot be confused at the
output if and only if the associated conditional distributions
$W(\cdot|x)$ and $W(\cdot|x')$ have disjoint supports. Furthermore, two
sequences $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{x}' = (x'_1, \ldots, x'_n)$ cannot
be confused if and only if there exists at least one index
$i$ such that symbols $x_i$ and $x'_i$ are not confusable. For a
given channel $W$, it is then useful to define a confusability
graph $G(W)$ whose vertices are the elements of $\mathcal{X}$ and whose
edges are the elements $(x, x') \in \mathcal{X}^2$ such that $x$ and $x'$ are
confusable. It is then easily seen that $C_0$ only depends on
$G(W)$. Furthermore, for any $G$, we can always find a channel
$W$ such that $G(W) = G$. Thus, we may equivalently speak
of the zero-error capacity of a channel $W$ or of the capacity
$C(G)$ of the graph $G$ if $G = G(W)$, and we will use those
two notions interchangeably throughout the paper.
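The construction of the confusability graph from the transition probabilities can be sketched as follows. The five-input "pentagon" channel used as an example, in which each input is received either as itself or as its cyclic successor, is an assumed illustration, as are the function names.

```python
from itertools import combinations

def confusability_graph(W):
    """Edges (x, x') such that W(.|x) and W(.|x') share an output of positive probability."""
    edges = set()
    for x, xp in combinations(range(len(W)), 2):
        if any(a > 0 and b > 0 for a, b in zip(W[x], W[xp])):
            edges.add((x, xp))
    return edges

def pentagon_channel():
    """Assumed example: input x is received as x or x+1 (mod 5), each with probability 1/2."""
    W = [[0.0] * 5 for _ in range(5)]
    for x in range(5):
        W[x][x] = 0.5
        W[x][(x + 1) % 5] = 0.5
    return W

edges = confusability_graph(pentagon_channel())
```

For this channel the resulting graph is the five-cycle: each input is confusable only with its two cyclic neighbours. Only the supports of the rows of `W` matter, which is why $C_0$ depends on the channel only through $G(W)$.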

A first upper bound to $C_0$ was obtained by Shannon [2],
who upper bounded $C_0$ with the zero-error capacity $C_{FB}$ when
perfect feedback is available. He could prove by means of a
combinatorial argument that, if $C_0 > 0$, then

$$C_{FB} = \max_P \, -\log \max_y \sum_{x:\, W(y|x) > 0} P(x). \tag{8}$$

Given a graph $G$, then, the best bound to $C(G)$ is obtained
by using the channel $W$ with $G(W) = G$ which minimizes
$C_{FB}$. Interestingly enough, this bound can also be obtained by
a rather different method that relies on bounding the reliability
function $E(R)$. In particular, the so-called sphere-packing
bound, first derived in [6] and later rigorously proved in [3],
states that $E(R) \le E_{sp}(R)$, where $E_{sp}(R)$ is defined by

$$E_{sp}(R) = \sup_{\rho \ge 0} \, [E_0(\rho) - \rho R],$$

$$E_0(\rho) = \max_P E_0(\rho, P),$$

$$E_0(\rho, P) = -\log \sum_y \Big( \sum_x P(x) W(y|x)^{1/(1+\rho)} \Big)^{1+\rho}.$$
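A rough numerical sketch of the sphere-packing exponent defined above is given next. It evaluates $E_0(\rho, P)$ in nats and approximates the supremum over $\rho$ by a finite grid, so the returned value is a lower approximation. The binary symmetric channel, the use of the uniform input (assumed optimal here by symmetry), the grid parameters and the function names are all assumptions for illustration.

```python
from math import log

def E0(rho, P, W):
    """E_0(rho, P) = -log sum_y ( sum_x P(x) W(y|x)^(1/(1+rho)) )^(1+rho), in nats."""
    s = 1.0 / (1.0 + rho)
    total = 0.0
    for y in range(len(W[0])):
        inner = sum(P[x] * W[x][y] ** s for x in range(len(P)))
        total += inner ** (1.0 + rho)
    return -log(total)

def sphere_packing_exponent(R, P, W, rho_max=50.0, steps=5000):
    """Grid approximation (from below) of sup over rho >= 0 of E_0(rho, P) - rho * R."""
    grid = (rho_max * k / steps for k in range(steps + 1))
    return max(E0(rho, P, W) - rho * R for rho in grid)

# Assumed example: binary symmetric channel W[x][y] with uniform input.
p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]
uniform = [0.5, 0.5]
capacity = log(2) + p * log(p) + (1 - p) * log(1 - p)
```

Two sanity checks follow from the definitions: $E_0(0, P) = 0$ for every $P$, and the exponent vanishes at $R = C$ while being strictly positive below capacity. For channels whose optimal input is not known in advance, an outer maximization over $P$ would also be needed.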



The function $E_{sp}(R)$ is finite for all rates $R$ larger than the
quantity

$$R_\infty = \max_P \, -\log \max_y \sum_{x:\, W(y|x) > 0} P(x), \tag{9}$$

which implies that $E(R)$ is finite for $R > R_\infty$ and thus that
$C_0 \le R_\infty$. Interestingly enough, we see that if $C_0 > 0$ then
$R_\infty = C_{FB}$. This implies that in all cases of practical interest,
Shannon's bound to $C_0$, which was first derived by means of
a combinatorial method, can also be deduced from the sphere-packing
bound, which is instead derived in a probabilistic
setting.

A major breakthrough came with Lovasz's 1979 work [4].
Given a confusability graph $G$, Lovasz calls an orthonormal
representation of $G$ any set $\{u_x\}_{x \in \mathcal{X}}$ of unit norm vectors
in any Hilbert space such that $u_x$ and $u_{x'}$ are orthogonal if
symbols $x$ and $x'$ are not confusable. We will use here the bra-ket
notation $\langle a|b \rangle$ for the scalar product between two vectors
$a$ and $b$. He then defines the value of a representation $\{u_x\}$ as

$$V(\{u_x\}) = \min_c \max_x \log \frac{1}{|\langle u_x|c \rangle|^2}, \tag{10}$$

where the minimum is over all unit norm vectors $c$. The
vector $c$ that achieves the minimum above is called the handle
of the representation. Lovasz shows that any orthonormal
representation satisfies $V(\{u_x\}) \ge C_0$. Optimizing over all
representations, he thus gives a bound for $C_0$ in the form
$C_0 \le \vartheta$, where

$$\vartheta = \min_{\{u_x\}} \min_c \max_x \log \frac{1}{|\langle u_x|c \rangle|^2}$$

is the so-called Lovasz theta function.$^1$ This result is usually
considered of a purely combinatorial nature and no probabilistic
interpretation seems to have emerged up to now. It
is interesting to note, however, that a possible representation
for the confusability graph of a channel $W$ can simply be
constructed by taking the set of $|\mathcal{Y}|$-dimensional real valued
vectors $\{\psi_x\}$ with components $\psi_x(y) = \sqrt{W(y|x)}$. As we
will show later, the value of this representation $V(\{\psi_x\})$
is precisely the cut-off rate of the channel, which is never
smaller than $C_0$. Clearly, using different channels $W'$ (with
$G(W') = G(W)$), we may upper bound $C_0$ with the lowest of
their cut-off rates. Nicely enough, it turns out that this would
lead precisely to the same upper bound obtained by means of $C_{FB}$
(or $R_\infty$). Lovasz's theta function achieves a smaller upper
bound to $C_0$ due to the fact that it allows the components
of the vectors of a representation to take on negative values.
Lovasz's approach seems thus to suggest bounding the zero-error
capacity by considering the use of quantum-theoretic
wave functions in place of classical probability distributions.
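To make the notion of value of a representation concrete, the sketch below checks the classical "umbrella" representation of the pentagon graph from Lovasz's paper: five unit vectors in three dimensions, with handle $c = (0, 0, 1)$. The specific coordinates and the function names are assumptions of this example; the value is evaluated at the stated handle only, so on its own the code yields an upper bound on $V(\{u_x\})$.

```python
from math import cos, sin, pi, sqrt, log

def umbrella():
    """Five unit vectors whose squared overlap with c = (0, 0, 1) equals 1/sqrt(5)."""
    cos_t = 5 ** -0.25
    sin_t = sqrt(1 - cos_t ** 2)
    return [(sin_t * cos(2 * pi * k / 5), sin_t * sin(2 * pi * k / 5), cos_t)
            for k in range(5)]

def dot(a, b):
    """Real scalar product <a|b>."""
    return sum(ai * bi for ai, bi in zip(a, b))

u = umbrella()
handle = (0.0, 0.0, 1.0)

# In the pentagon, vertices k and k+2 (mod 5) are the non-confusable pairs.
non_confusable_overlaps = [dot(u[k], u[(k + 2) % 5]) for k in range(5)]

# Objective of (10) evaluated at the chosen handle.
value_at_handle = max(log(1.0 / dot(ux, handle) ** 2) for ux in u)
```

The overlaps of all non-confusable pairs vanish, so this is a valid orthonormal representation, and the value at the handle is $\frac{1}{2} \log 5$, the logarithm of Lovasz's $\sqrt{5}$ for the pentagon.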

1 We use a logarithmic version of the theta function so as to make its
comparison with rates more straightforward.



C. Renyi's Information Radii

It is known [7] that the capacity of a classical channel can
be written as an information radius according to the expression

$$C = \min_Q \max_x D(W(\cdot|x) \| Q), \tag{11}$$

where $D(\cdot \| \cdot)$ is the Kullback-Leibler divergence. This min-max
formulation was extended by Csiszar [8] to describe the
reliability function in the high rate region. Here, since we
are only interested in upper bounds to $E(R)$, it is useful
to consider the sphere-packing exponent $E_{sp}(R)$, for which
Csiszar's min-max expression holds with full generality. The
function $E_{sp}(R)$ equals the upper envelope of all the lines
$E_0(\rho) - \rho R$, and an important quantity is the value $R_\rho =
E_0(\rho)/\rho$ at which each of these lines meets the $R$ axis.$^2$ Given
two distributions $Q_1$ and $Q_2$ on the channel output $\mathcal{Y}$, define
the Renyi divergence of order $\alpha \in (0, 1)$ of $Q_1$ from $Q_2$ as

$$D_\alpha(Q_1 \| Q_2) = \frac{1}{\alpha - 1} \log \sum_y Q_1(y)^\alpha Q_2(y)^{1-\alpha}. \tag{12}$$

It is then shown in [8, Prop. 1] that

$$R_\rho = \min_Q \max_x D_\alpha(W(\cdot|x) \| Q), \qquad \alpha = 1/(1+\rho). \tag{13}$$

Using the known properties of the Renyi divergence (see [8]),
we find that when $\rho \to 0$ the above expression (with $\alpha \to 1$)
gives the already mentioned expression for the capacity (11),
while for $\rho \to \infty$ we obtain

$$R_\infty = \min_Q \max_x \, -\log \sum_{y:\, W(y|x) > 0} Q(y), \tag{14}$$

which is the dual formulation of (9).
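The duality between (9) and (14) can be checked numerically on a small example: the objective of (9) at any input distribution $P$ is a lower bound on $R_\infty$, while the objective of (14) at any output distribution $Q$ is an upper bound. The sketch below uses an assumed five-input "pentagon" channel (each input is received as itself or as its cyclic successor) with uniform $P$ and $Q$; the channel and the function names are illustrative assumptions.

```python
from math import log

def lower_bound_from_P(P, W):
    """Objective of (9) at a given P: -log max_y of the sum of P(x) over x with W(y|x) > 0."""
    n_out = len(W[0])
    return -log(max(sum(P[x] for x in range(len(W)) if W[x][y] > 0)
                    for y in range(n_out)))

def upper_bound_from_Q(Q, W):
    """Objective of (14) at a given Q: max_x of -log of the sum of Q(y) over y with W(y|x) > 0."""
    return max(-log(sum(Q[y] for y in range(len(Q)) if W[x][y] > 0))
               for x in range(len(W)))

# Assumed example: pentagon channel, W[x][y] = 1/2 for y in {x, x+1 mod 5}.
W = [[0.5 if y in (x, (x + 1) % 5) else 0.0 for y in range(5)] for x in range(5)]
uniform = [0.2] * 5
lower = lower_bound_from_P(uniform, W)
upper = upper_bound_from_Q(uniform, W)
```

Here the two bounds coincide at $\log(5/2)$, which therefore equals $R_\infty$ for this channel. For a general channel, the optimal $P$ and $Q$ would have to be found by solving the corresponding linear programs.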

It is evident that there is an interesting similarity between
the min-max expression for $R_\rho$ of a channel $W$ and the
value of a representation in Lovasz's sense. In the next
sections, we will show that this similarity is not a simple
coincidence. Lovasz's bound to $C_0$ and the sphere-packing
bound to $E(R)$ are based on the very same idea and can be
described in a unified way in probabilistic terms in the context
of quantum probability. By considering the extension of the
sphere-packing bound to classical-quantum channels, we will
show that Lovasz's bound emerges naturally, in that case, as
a consequence of the bound $C_0 \le R_\infty$.

Remark 1: A very nice fact, apparently not reported in the
literature, is that the usual cut-off rate of a classical channel
$W$, evaluated according to equation (13) with $\alpha = 1/2$, is
precisely the value $V(\{\psi_x\})$ of the representation $\{\psi_x\}$ with
$\psi_x = \sqrt{W(\cdot|x)}$. In this paper, however, we will interpret
Lovasz's value of a representation $\{u_x\}$ in relation to the
rate $R_\infty$ of a pure-state classical-quantum channel with state
vectors $|u_x\rangle$. It turns out [9] that the cut-off rate of a classical
channel $W$ precisely equals the rate $R_\infty$ of a pure-state
classical-quantum channel with state vectors $|\psi_x\rangle$ as defined
above, but the true reason for this equivalence is not yet clear.
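The coincidence described in Remark 1 can be observed numerically on a simple case. The sketch below computes, for an assumed binary symmetric channel, (a) $E_0(1, P)$ with uniform $P$, (b) the objective of (13) with $\alpha = 1/2$ at uniform $Q$, and (c) the objective of (10) for the vectors $\psi_x = \sqrt{W(\cdot|x)}$ at the handle $c = (1, 1)/\sqrt{2}$. The uniform choices and the handle are assumed to be optimal by symmetry; in general, (a) is only a lower bound on the cut-off rate and (b), (c) only upper bounds on the corresponding min-max quantities.

```python
from math import log, sqrt

# Assumed example: binary symmetric channel W[x][y] with crossover probability p.
p = 0.1
W = [[1 - p, p], [p, 1 - p]]

# (a) E_0(1, P) with uniform P.
cutoff_E0 = -log(sum((0.5 * sqrt(W[0][y]) + 0.5 * sqrt(W[1][y])) ** 2
                     for y in range(2)))

def renyi_half(Q1, Q2):
    """Renyi divergence (12) of order alpha = 1/2, in nats."""
    return -2.0 * log(sum(sqrt(a * b) for a, b in zip(Q1, Q2)))

# (b) Objective of (13) with alpha = 1/2 at uniform Q.
cutoff_radius = max(renyi_half(W[x], [0.5, 0.5]) for x in range(2))

# (c) Objective of (10) for psi_x = sqrt(W(.|x)) at the handle c = (1, 1)/sqrt(2).
c = [1 / sqrt(2), 1 / sqrt(2)]
value = max(log(1.0 / sum(sqrt(W[x][y]) * c[y] for y in range(2)) ** 2)
            for x in range(2))

closed_form = log(2 / (1 + 2 * sqrt(p * (1 - p))))
```

All three numbers agree with the closed-form cut-off rate of the binary symmetric channel. Since (a) is a lower bound and (b) an upper bound on the same quantity, their agreement confirms that the uniform choices are optimal in this example.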

2 Here, since we also consider the true zero-error capacity $C_0$, we do not
adopt Csiszar's notation of channel capacity of order $\alpha$.



III. Classical-Quantum Channels 
A. Basic notions and the sphere-packing bound 

We introduce here the minimal notions and results on
classical-quantum channels so as to make this paper as self-contained
as possible. The interested reader may refer to [10],
[11] for more details.

Following [12], consider a classical-quantum channel with
a finite input alphabet $\mathcal{X}$ with associated density operators $S_x$,
$x \in \mathcal{X}$, in a finite dimensional Hilbert space$^3$ $\mathcal{H}$. The $n$-fold
product channel acts in the tensor product space $\mathcal{H}^{\otimes n}$ of $n$
copies of $\mathcal{H}$. To a codeword $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ is associated
the signal state $S_{\mathbf{x}} = S_{x_1} \otimes S_{x_2} \otimes \cdots \otimes S_{x_n}$. A block code
with $M$ codewords is a mapping from a set of $M$ messages
$\{1, \ldots, M\}$ into a set of $M$ codewords $\mathbf{x}_1, \ldots, \mathbf{x}_M$. The rate
of the code is defined as $R = (\log M)/n$.

A quantum decision scheme for such a code is a so-called
POVM (see for example [11]), that is, a collection of $M$
positive operators$^4$ $\{\Pi_1, \Pi_2, \ldots, \Pi_M\}$ such that $\sum_m \Pi_m \le \mathbb{1}$,
where $\mathbb{1}$ is the identity operator. The probability that message
$m'$ is decoded when message $m$ is transmitted is $P(m'|m) =
\operatorname{Tr}(\Pi_{m'} S_{\mathbf{x}_m})$. The probability of error after sending message
$m$ is

$$P_{e|m} = 1 - \operatorname{Tr}(\Pi_m S_{\mathbf{x}_m}). \tag{15}$$

We then define $P_{e,\max}$, $P_{e,\max}^{(n)}(R)$, $C$, $C_0$ and $E(R)$ precisely
as in the classical case.

Focusing on $C_0$, we first show that, as in the classical case,
we can still express $C_0$ as the capacity of a confusability graph
where, in this case, two input symbols are confusable if and
only if $\operatorname{Tr}(S_x S_{x'}) > 0$. In fact, if a code with $M$ codewords
satisfies $P_{e,\max} = 0$, then for each $m \ne m'$ we must have
$\operatorname{Tr}(\Pi_m S_{\mathbf{x}_m}) = 1$ and $\operatorname{Tr}(\Pi_m S_{\mathbf{x}_{m'}}) = 0$. This is possible if
and only if the signals $S_{\mathbf{x}_m}$ and $S_{\mathbf{x}_{m'}}$ are orthogonal, that
is $\operatorname{Tr}(S_{\mathbf{x}_m} S_{\mathbf{x}_{m'}}) = 0$. But, using the property that
$\operatorname{Tr}((A \otimes B)(C \otimes D)) = \operatorname{Tr}(AC) \operatorname{Tr}(BD)$, we have

$$\operatorname{Tr}(S_{\mathbf{x}_m} S_{\mathbf{x}_{m'}}) = \prod_{i=1}^{n} \operatorname{Tr}(S_{x_{m,i}} S_{x_{m',i}}). \tag{16}$$

This implies that $\operatorname{Tr}(S_{x_{m,i}} S_{x_{m',i}}) = 0$ for at least one value
of $i$. Thus, evaluating the zero-error capacity in the classical-quantum
setting amounts to evaluating the capacity of a graph
as defined in the previous section. In this sense, there is no
difference between classical and classical-quantum channels
and, given a graph $G$, we can interpret the capacity $C(G)$ as
either the zero-error capacity $C_0$ of a classical or of a classical-quantum
channel with that confusability graph.
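For pure states $S_x = |u_x\rangle\langle u_x|$ with real state vectors, the single-letter overlap reduces to $\operatorname{Tr}(S_x S_{x'}) = \langle u_x|u_{x'}\rangle^2$, and the factorization (16) can be sketched in a few lines. The three-state example and the function names below are assumptions made for illustration.

```python
def overlap(u, v):
    """Tr(S_u S_v) = <u|v>^2 for real unit vectors u, v (pure states S = |u><u|)."""
    return sum(a * b for a, b in zip(u, v)) ** 2

def sequence_overlap(xs, xps, states):
    """Right-hand side of (16): product over positions of the single-letter overlaps."""
    result = 1.0
    for x, xp in zip(xs, xps):
        result *= overlap(states[x], states[xp])
    return result

# Assumed example: symbols 0 and 2 are orthogonal; symbol 1 overlaps with both.
s = 2 ** -0.5
states = {0: (1.0, 0.0), 1: (s, s), 2: (0.0, 1.0)}

distinguishable = sequence_overlap((0, 1), (2, 1), states)  # position 1 is orthogonal
confusable = sequence_overlap((0, 1), (1, 2), states)       # no orthogonal position
```

The first pair of sequences has zero overlap because the symbols in the first position are orthogonal, while the second pair has overlap 1/4 and therefore cannot belong to the same zero-error code.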

For classical-quantum channels, bounds to the reliability
function $E(R)$ have been developed which partially match
those of the classical case. Lower bounds to the reliability
function were obtained in [13] and [12], while upper bounds
have remained relatively unexplored until recently. For general
$R > 0$, the first upper bound to $E(R)$ was obtained as
an extension of the classical sphere-packing bound of [3]. The
bound can be stated as follows.

3 The $S_x$ can thus be represented as positive semi-definite Hermitian
matrices with unit trace.

4 The operators $\Pi_m$ can thus be represented as positive semi-definite
matrices. The notation $\sum_m \Pi_m \le \mathbb{1}$ simply means that $\mathbb{1} - \sum_m \Pi_m$ is
positive semidefinite. Note that, by construction, all the eigenvalues of each
operator $\Pi_m$ must be in the interval $[0, 1]$.

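Theorem 1 (Sphere Packing Bound): For all positive
rates $R$ and all positive $\varepsilon < R$,

$$E(R) \le E_{sp}(R - \varepsilon), \tag{17}$$

where $E_{sp}(R)$ is defined by the relations

\begin{align}
E_{sp}(R) &= \sup_{\rho \ge 0} \, [E_0(\rho) - \rho R], \tag{18} \\
E_0(\rho) &= \max_P E_0(\rho, P), \tag{19} \\
E_0(\rho, P) &= -\log \operatorname{Tr} \Big( \sum_x P(x) S_x^{1/(1+\rho)} \Big)^{1+\rho}. \tag{20}
\end{align}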



B. Quantum Renyi's Information Radii 

We now extend Csiszar's result to give a characterization
of the sphere-packing bound for classical-quantum channels
in terms of Renyi's information measures. Given two density
operators $F_1$ and $F_2$ in $\mathcal{H}$, define the Renyi divergence of
order $\alpha$ of $F_1$ from $F_2$ as

$$D_\alpha(F_1 \| F_2) = \frac{1}{\alpha - 1} \log \operatorname{Tr} F_1^\alpha F_2^{1-\alpha}. \tag{21}$$

As in the classical case, let then $R_\rho = E_0(\rho)/\rho$. Then we
have the following result.

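Theorem 2: For a classical-quantum channel with states $S_x$,
$x \in \mathcal{X}$, the rate $R_\rho$ defined above satisfies

$$R_\rho = \min_F \max_x D_\alpha(S_x \| F), \qquad \alpha = 1/(1+\rho). \tag{22}$$

Proof: Setting $\alpha = 1/(1+\rho)$, we can write

$$R_\rho = \max_P \frac{\alpha}{\alpha - 1} \log \operatorname{Tr} \Big( \sum_x P(x) S_x^\alpha \Big)^{1/\alpha} \tag{23}$$

and, defining $A(\alpha, P) = \sum_x P(x) S_x^\alpha$, we can write

$$R_\rho = \max_P \frac{1}{\alpha - 1} \log \| A(\alpha, P) \|_{1/\alpha}, \tag{24}$$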

where $\| \cdot \|_r$ is the Schatten $r$-norm. From the Holder inequality
we know that, for any positive operators $A$ and $B$, we have

$$\|A\|_{1/\alpha} \, \|B\|_{1/(1-\alpha)} \ge \operatorname{Tr}(AB) \tag{25}$$

with equality if and only if $B = \gamma A^{1/\alpha - 1}$ for some scalar
coefficient $\gamma$. Thus we can write

$$\|A\|_{1/\alpha} = \max_{\|B\|_{1/(1-\alpha)} \le 1} \operatorname{Tr}(AB), \tag{26}$$

where $B$ runs over positive operators in the unit ball in the
$(1/(1-\alpha))$-norm. Using this expression for the Schatten norm
we obtain

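\begin{align}
R_\rho &= \max_P \frac{1}{\alpha - 1} \log \max_{\|B\|_{1/(1-\alpha)} \le 1} \operatorname{Tr}(A(\alpha, P) B) \tag{27} \\
&= \frac{1}{\alpha - 1} \log \min_P \max_{\|B\|_{1/(1-\alpha)} \le 1} \operatorname{Tr} \Big( \sum_x P(x) S_x^\alpha B \Big). \tag{28}
\end{align}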



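In the last expression, the minimum and the maximum are
both taken over convex sets and the objective function is linear
both in $P$ and $B$. Thus, we can interchange the order of
maximization and minimization to get

\begin{align}
R_\rho &= \frac{1}{\alpha - 1} \log \max_{\|B\|_{1/(1-\alpha)} \le 1} \min_P \sum_x P(x) \operatorname{Tr}(S_x^\alpha B) \tag{29} \\
&= \frac{1}{\alpha - 1} \log \max_{\|B\|_{1/(1-\alpha)} \le 1} \min_x \operatorname{Tr}(S_x^\alpha B). \tag{30}
\end{align}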

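Now, we note that the maximum over $B$ can always be
achieved by a positive operator, since all the $S_x$ are positive
operators. Thus, we can change the dummy variable $B$ with
$F = B^{1/(1-\alpha)}$, where $F$ is now a positive operator constrained
to satisfy $\operatorname{Tr}(F) \le 1$. Since the objective does not decrease when
$F$ is scaled up, the maximum is attained with $\operatorname{Tr}(F) = 1$, that is,
by a density operator. Using $F$, we get

\begin{align}
R_\rho &= \frac{1}{\alpha - 1} \log \max_F \min_x \operatorname{Tr}(S_x^\alpha F^{1-\alpha}) \tag{31} \\
&= \min_F \max_x \frac{1}{\alpha - 1} \log \operatorname{Tr}(S_x^\alpha F^{1-\alpha}) \tag{32} \\
&= \min_F \max_x D_\alpha(S_x \| F), \tag{33}
\end{align}

where $F$ now runs over all density operators. ■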
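The equality case of the Holder inequality used in the proof can be verified directly. The following worked computation is a sketch under the assumption that $A$ is a nonzero positive operator and $0 < \alpha < 1$; the scalar $\gamma$ is the normalization coefficient of the maximizing operator $B$.

```latex
% Candidate maximizer in (26): B = \gamma A^{1/\alpha - 1}, with \gamma > 0.
\begin{align*}
\|B\|_{1/(1-\alpha)}
  &= \gamma \left( \operatorname{Tr} A^{(1/\alpha - 1)/(1-\alpha)} \right)^{1-\alpha}
   = \gamma \left( \operatorname{Tr} A^{1/\alpha} \right)^{1-\alpha}, \\
\operatorname{Tr}(AB)
  &= \gamma \operatorname{Tr} A^{1/\alpha}. \\
\intertext{Choosing $\gamma = (\operatorname{Tr} A^{1/\alpha})^{-(1-\alpha)}$ gives
$\|B\|_{1/(1-\alpha)} = 1$ and}
\operatorname{Tr}(AB)
  &= \left( \operatorname{Tr} A^{1/\alpha} \right)^{\alpha} = \|A\|_{1/\alpha}.
\end{align*}
```

The first line uses the identity $(1/\alpha - 1)/(1-\alpha) = 1/\alpha$. The computation shows that the unit ball of the $(1/(1-\alpha))$-norm contains an operator attaining $\operatorname{Tr}(AB) = \|A\|_{1/\alpha}$, which together with (25) gives the variational expression (26).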
It is obvious that, if all operators $S_x$ commute, which means
that the channel is classical, then the optimal $F$ is diagonal in
the same basis where the $S_x$ are, and we thus recover Csiszar's
expression for the classical case. Furthermore, for $\rho \to 0$ (that
is, $\alpha \to 1$) we obtain the expression of the capacity as an
information radius already established for classical-quantum
channels [14]. When $\rho = 1$ (that is, $\alpha = 1/2$), we obtain
an alternative expression for the so-called quantum cut-off
rate [15]. The most important case in our context, however, is
the case when $\rho \to \infty$ (that is, $\alpha \to 0$). Taking the limit in
Theorem 2, and letting $S_x^0$ be the projector onto the support of $S_x$,
we obtain

$$R_\infty = \min_F \max_x \log \frac{1}{\operatorname{Tr}(S_x^0 F)}, \tag{34}$$

where the minimum is again over all density operators $F$. The
analogy with the Lovasz theta function now becomes evident
if we consider a special case of this expression. Assume that
the states $S_x$ are pure and set $S_x = |u_x\rangle\langle u_x|$. Consider for
a moment the search for the optimum $F$ when restricted to
rank-one operators, that is $F = |f\rangle\langle f|$. We see that in this
case we can write $\operatorname{Tr}(S_x^0 F) = |\langle u_x|f \rangle|^2$. When searching over
all possible $F$, we thus find that for this channel we have

$$R_\infty \le V(\{u_x\}). \tag{35}$$

Hence, we see that Lovasz's bound $C_0 \le V(\{u_x\})$ can be
deduced as a consequence of the sphere-packing bound. Of
course, for a given graph $G$, one may want to bound $C(G)$
with the smallest $R_\infty$ over all channels with confusability
graph $G$. This leads to a bound to $C_0$, discussed in the next
section, which generalizes, at least formally, the Lovasz theta
function.



IV. A Sphere-Packed Lovasz Theta Function 






For a given confusability graph $G$, inspired by (34), we
define a representation of $G$ as any set of projectors $\{U_x\}$ such
that $U_x U_{x'} = 0$ if symbols $x$ and $x'$ cannot be confused.
Furthermore, we introduce an alternative definition of value

$$V_{sp}(\{U_x\}) = \min_F \max_x \log \frac{1}{\operatorname{Tr}(U_x F)}, \tag{36}$$

where the minimum is over all density operators $F$. The
optimal $F$ will be called again the handle of the representation.
We then finally define the quantity$^5$

$$\vartheta_{sp} = \min_{\{U_x\}} \min_F \max_x \log \frac{1}{\operatorname{Tr}(U_x F)}, \tag{37}$$

where $\{U_x\}$ runs over all representations of the graph $G$. We
then have the following result.

Theorem 3: For any graph $G$, we have

$$C(G) \le \vartheta_{sp} \le \vartheta. \tag{38}$$

Proof: The fact that $\vartheta_{sp} \le \vartheta$ is obvious, since Lovasz's
$\vartheta$ is obtained by restricting the minimization in the definition
of $\vartheta_{sp}$ to rank-one projectors $U_x = |u_x\rangle\langle u_x|$ and handle
$F = |f\rangle\langle f|$. That $C(G) \le \vartheta_{sp}$ should be clear in light of
the above discussion on the bound $E(R) \le E_{sp}(R)$. It is
instructive, however, to present an alternative, self-contained
proof that does not involve the function $E_{sp}(R)$. This can
be done along the same argument used by Lovasz but in the
more general situation where general projector operators are
used for the representation in place of Lovasz's vectors, and
a general density operator is used as the handle.

Consider an optimal representation $\{U_x\}$ and, to a sequence
of symbols $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, associate the operator
(projector) $U_{\mathbf{x}} = U_{x_1} \otimes U_{x_2} \otimes \cdots \otimes U_{x_n}$. Consider then a zero-error
code with $M$ codewords of length $n$, $\mathbf{x}_1, \ldots, \mathbf{x}_M$, and
their associated projectors $U_{\mathbf{x}_1}, \ldots, U_{\mathbf{x}_M}$. Then, as proved
before, for $m \ne m'$ we have $\operatorname{Tr}(U_{\mathbf{x}_m} U_{\mathbf{x}_{m'}}) = 0$. Hence,
since the states $\{U_{\mathbf{x}_m}\}$ are orthogonal projectors, we clearly
have

$$\sum_{m=1}^{M} U_{\mathbf{x}_m} \le \mathbb{1}, \tag{39}$$

where $\mathbb{1}$ is the identity operator. Consider now the state $\mathbf{F} =
F^{\otimes n}$, where $F$ is the handle of the representation $\{U_x\}$. Note
that, for each $m$, we have

$$\operatorname{Tr}(U_{\mathbf{x}_m} \mathbf{F}) = \prod_{i=1}^{n} \operatorname{Tr}(U_{x_{m,i}} F) \ge e^{-n\vartheta_{sp}},$$

since each factor is at least $e^{-\vartheta_{sp}}$ by the definition of the handle.
Hence, we have

\begin{align}
1 &= \operatorname{Tr}(\mathbf{F}) \tag{40} \\
&\ge \sum_{m=1}^{M} \operatorname{Tr}(U_{\mathbf{x}_m} \mathbf{F}) \tag{41} \\
&\ge M e^{-n\vartheta_{sp}}. \tag{42}
\end{align}

Thus, we deduce that $M \le e^{n\vartheta_{sp}}$. This implies that, for any $n$,
the rate of a zero-error code of length $n$, and thus the capacity
of the graph $C_0$, is not larger than $\vartheta_{sp}$. ■

5 As far as we could understand, the defined quantity does not have an
evident connection with the results of [16]. There, the authors extend the
algebraic definition of the Lovasz theta function to consider what they call
non-commutative graphs. Our definition is instead intended for the classical
notion of confusability graph.
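The counting bound $M \le e^{n\vartheta_{sp}}$ can be compared with a concrete zero-error code. The sketch below checks Shannon's well-known length-2 code with five codewords for the pentagon graph, and compares its rate with Lovasz's value $\frac{1}{2}\log 5$ for that graph. The pentagon example, the hard-coded value and the function names are assumptions of this illustration, not part of the proof above.

```python
from itertools import combinations
from math import log

def confusable(x, xp):
    """Pentagon graph: two symbols are confusable if equal or cyclically adjacent."""
    return (x - xp) % 5 in (0, 1, 4)

def is_zero_error(code):
    """True if every pair of codewords has at least one non-confusable position."""
    return all(any(not confusable(a, b) for a, b in zip(c1, c2))
               for c1, c2 in combinations(code, 2))

# Shannon's length-2 code for the pentagon: (0,0), (1,2), (2,4), (3,1), (4,3).
code = [(i, (2 * i) % 5) for i in range(5)]
n, M = 2, len(code)
rate = log(M) / n

# Lovasz's (logarithmic) theta value for the pentagon, taken as a known result.
theta_pentagon = 0.5 * log(5)
```

The code is zero-error and its rate equals $\frac{1}{2}\log 5$. Combined with Theorem 3, this suggests that for the pentagon the chain $C(G) \le \vartheta_{sp} \le \vartheta$ holds with equality throughout, so this graph cannot separate $\vartheta_{sp}$ from $\vartheta$.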
One may wonder whether there exists a graph for which
$\vartheta_{sp} < \vartheta$. We have not yet found an answer to this question,
which is the objective of an ongoing study. It may be worth
pointing out that, even if we restrict the search to optimal rank-one
representations as in Lovasz's case, it is not obvious
what happens by allowing $F$ to have rank larger than one. For
example, experimental evidence shows that, for a rank-one
representation with states $U_x = |u_x\rangle\langle u_x|$, the optimal handle
has in some cases rank larger than one if $\langle u_x|u_{x'} \rangle < 0$ for
some $x, x'$ (see [9] for more details). What we have not yet
determined, however, is whether this can happen for an optimal
representation achieving $\vartheta_{sp}$. It is worth observing that even
experimentally testing specific graphs is not a simple task,
since we have not yet found an efficient method to evaluate
$\vartheta_{sp}$.